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We devise and explore an iterative optimization procedure for controlling particle populations in particle-in¬ 
cell (PIC) codes via merging and splitting of computational macro-particles. Our approach, is to compute 
an optimal representation of the global particle phase space structure while decreasing or increasing the 
entire particle population, based on k-means clustering of the data. In essence the procedure amounts to 
merging or splitting particles by statistical means, throughout the entire simulation volume in question, while 
minimizing a 6-dimensional total distance measure to preserve the physics. Particle merging is by far the 
most demanding procedure when considering conservation laws of physics; it amounts to lossy compression 
of particle phase space data. We demonstrate that our k-means approach conserves energy and momentum 
to high accuracy, even for high compression ratios, IZ « 3 — i.e., Nf < 0.33A^. Interestingly, we find 
that an accurate particle splitting step can be performed using k-means as well; this from an argument of 
symmetry. The split solution, using k-means, places splitted particles optimally, to obtain maximal spanning 
on the phase space manifold. Implementation and testing is done using an electromagnetic PIC code, the 
Photon-Plasma code. Nonetheless, the k-means framework is general; it is not limited to Vlasov-Maxwell 
type PIC codes. We discuss advantages and drawbacks of this optimal phase space reconstruction. 
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I. INTRODUCTION 

Control of computational macro-particle (CMP) pop¬ 
ulations in Particle-In-Cell (PIC) codes is particularly 
desirable in at least two situations: 

Population Runaway: Monte Carlo realizations of col- 
lisional processes in PIC codes, for example, of¬ 
ten involves fractionation of CMPs into ’’parents” 
and ’’children” for enhanced statistical resolution of 
the collision processes. This results in explosion of 
CMP populations, and a memory bounded simula¬ 
tion longevity. 

Load balancing: in PIC codes relies on the ability to 
redistribute CMPs among computational processes 
(e.g. in MPI domain decomposed models) at run¬ 
time to maintain similar execution times of the 
computational processes, and preserve statistical 
resolution of continuous phase space. 

CMP de-population (re-population) of domains that 
are progressively filled (depleted) can be achieved 
through deletion (addition) of CMPs - for those do¬ 
mains which are oversampled (undersampled), while 
attempting to maintain physical quantities locally 
conserved. Single particle deletion (addition) CMPs 
is detrimental with respect to the conservation of the 
physical properties of the system being modelecf 6 10 12 1 ®. 
It is necessary to merge (split) several CMPs to conserve 
both momentum and energy from the phase space 
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information available. 

An algorithm that can achieve this goal in a robust and 
efficient manner will benefit a wide range of problems 
in laboratory and astrophysical settings. Many physical 
processes naturally lead to runaway CMP populations 
(time domain), and extreme CMP concentrations (spatial 
domain), e.g. 

Load: High-intensity laser-plasma wakefield acceleration 
of electrons, Beck^ (also Figure [l]). 

Runaway: Gamma-Ray Burst wakefield plasma accel¬ 
eration, under the influence of detailed Compton 
scattering, Frederiksen 4 . 

Load: Streaming instabilities and agglomeration of plan- 
etesimals leading to planet formation, Johansen 
and Youdin 8 . 

Runaway & load: High-energy radiative processes and 
pair cascades in pulsar magnetospheres, Timokhin 
and Arons 27 . 

Load: Streams and caustics in the evolution of dark mat¬ 
ter structures in cosmological simulations, Vogels- 
berger and White^. 

All these cases (and many others) demand an efficient 
CMP population control and/or redistribution in large- 
scale numerical simulations. 

Several strategies for CMP merging have been visited in 
the literature over the last few decades, changing in order 
of complexity, cost and accuracy. Lapenta and Brack- 
bill^, also later Lapenta 10 and Teunissen and Ebert^, 
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FIG. 1. Early stage in ultra-high intensity laser pulse inter¬ 
acting with a quiescent homogenous plasma plume, showing 
electron CMP number density (colors not to scale). A bub¬ 
ble (dark central region), evacuated of electrons, is created 
by the highly non-linear disturbance from the laser field pulse 
(green, right), which is propagating to right. A hot-spot (yel¬ 
low, left) in the wake of the laser pulls electrons along, at 
close to the speed of light. The highly inhomogenous density, 
ranging from N e , min w 1 to iV e , max ~ 300 severely affects load 
balancing. Our method can alleviate this problem to speed 
up the simulation by a significant factor. 


considered the problem of merging/splitting on a sin¬ 
gle particle basis, e.g., 2 «->> 1, 3 2, and cell-based 

N ce ii M ce u approaches, (Lapenta and Brackbill^), 
with N and M small. More recently, more complex al¬ 
gorithms have emerged such as agglomerate clustering 
and resampling, and also oct-tree reconstructiorl 17 in mo¬ 
mentum space. 

Commonly, those previous strategies used means of al¬ 
gebraic reconstruction to ensure that physical field quan¬ 
tities, represented on the PIC discrete mesh (in r-space) 
would be conserved exactly. Some were investigated in 
reduced-dimensional systems, e.g. 1D3V (Martin and 
Cambierl^), although their method was not strictly con¬ 
strained to ID. Others further applied a reconstruction 
procedure which decomposed 6D phase space, /(r,p, t), 
in to 3D subspaces, / r (r,£) and f p (p, £), employing 
strict algebraic reconstruction on r-space, while retaining 
the solution found by agglomerate clustering in p-space 
(Grasso et al. 6 ). Any decomposition of phase space, M D , 
into phase subspaces M B and M c , with B+C=D (for our 
case D=6), removes information contained in possible 
cross-correlation between the subspaces. It is conceiv¬ 
able that such correlations should be preserved. 

In a view alternative to previous strategies, we 
consider the problem of reducing (increasing) particle 
phase space resolution by merging (splitting) CMPs, as 
an optimization problem in 6 dimensions. Our approach 
randomly selects existing particles as a global best 
guess at a solution for the clustering, with the objective 
to either merge or split them into a new imitative 
set of particles. Subsequently, a K-means iterative 
minimization of a global intra-cluster distance measure 
successively drives the merged (split) solution towards 
a reduced (increased) CMP population, with the same 
physical properties. 


representation. We then describe the details of our global 
k-means procedure; initialization, distance measure, par¬ 
ticle merging and splitting, as well a crude, yet impor¬ 
tant, edge preserving measure to circumvent k-means ar¬ 


outlines our test simulation setup and presents a few cru¬ 
cial tests of our k-means clustering procedure. Discussion 
and conclusions are given in Section [TV| 


tifacts on bounded domain decompositions. Section III 


II. K-MEANS CLUSTERING IN THE PIC CODES 


Generally, in electromagnetic PIC codes, the source 
terms in Maxwell’s equations, p c (r,t) and J(r,_p, £), are 
constructed from interpolated accumulation of a large 
number of computational macro-particles (CMPs) onto 
a computational mesh. These CMPs are distributed in 
continuous real space and momentum space, and given a 
continuous weight to signify the particle statistical influ¬ 
ence. For very large numbers of CMPs, we can approxi¬ 
mately describe the computational plasma everywhere by 
a distribution function, /(r,p, t) = / s (r,p, £), here¬ 

after phase space density, where the subscript ’s’ denotes 
particle species. 

In the Photon-Plasma code 7 } for the most complete 
case of 3D3V simulations, the CMP is represented by 
a six-tuplet of real numbers, r = {r x ,r y ,r z ,p x ,p y ,p z }, 
which positions the particle in 6-dimensional phase space 
(the tilde signifies a 6D vector). Further each CMP is 
given a statistical weight, Wi , which dictates a relative 
strength of the particle with respect to either the num¬ 
ber of physical particles, or a scaled amount of physical 
particles. Relativistic momentum is p = m^(v)v, with 
7 = yj (1 — /3 2 ), p m f/c, and in the Photon-Plasma 
code we most naturally keep the CMPs’ relativistic 3- 
velocity, p/tuq. For example p z = v z (l — /3^) -1 / 2 with 
/3 Z = v\ /c 2 . This renders direct addition and subtraction 
of particle momenta, vectorially, physically meaningful. 
Consequently, we may view the particle ensemble phase 
space as a collection of points in 6-dimensional Euclid¬ 
ian affine space, with a well defined algebra consisting 
of addition (e.g. p z ^ + p z j = p z ,k), subtraction (e.g. 
xi — xj = Xk) and a distance measure, 

d 2 (ri, rj) = (xi - Xj) 2 + ... + (p Zti - Pzj) 2 - ( 1 ) 


Particles can then be vectorially added or subtracted, 
and we can find a distance between them in this affine 
space. We can also construct an arithmetic mean, or for 
weighted particles, a weighted arithmetic mean of any 
ensemble, or cluster center point, of particles 


~ Ei w i r i 
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In Section [TTJ we describe the natural relationship be- where now barred vectors, i.e. r, denotes cluster points, 
tween k-means clustering and the PIC code phase space 
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These simple facts form the basis of this paper and the 
justification of global k-means clustering as a way of op¬ 
timal phase space reconstruction in, for example, PIC 
codes. 


A. Weighted k-means clustering 

Multivariate, multidimensional, data can be analyzed 
and manipulated using vector compression. K-Means 
belongs to this general class of vector compression 
algorithms 15 , and can be used to either refine or coarsen 
multivariate data manifolds. In this article, the weighted 
k-meanpl objective is: from a set of M data points, 
{fi,..., ?m}, with weights {wi ,..., wm}, in D-dimensional 
space, M d , find K cluster centers, {rq,..., with 

weights also in M^, which partition the origi¬ 

nal data in the optimal way. This is defined as that par¬ 
titioning which minimizes the total global intra-cluster 
distance, 


min (Vtot) = min ( N N w i\\ r i ~ r j\\ 2 j i ( 3 ) 
V ® 1 Tier 3 J 

with fj defined as the j’th cluster center by equation [ 2 ] 
(left), respectively (right). 

We choose to work in this paper in normalized data space, 
such that {r x ,...,p z } -► {r x /L x , ...,p z /L pz }, where 
{L X : •••, L pz } = {max(r x )—min(r cc ),..., max(p^)—min(p^)}. 
We cannot a priori assume that certain directions in 
phase space are more important than others with respect 
to the physics, if we want the procedure to be generally 
applicable. 

We did test the k-means procedure also, using non- 
normalized data space, i.e. {r x , ...,p z }, to see how this 
would affect the merged solution, for the case of a thermal 
plasma. Significant differences were found between the 
solutions in the two very different representations of the 
phase space data. We give a few results, superficially, in 
the section on tests (below). 

The choice of a distance norm, and the choice of nor¬ 
malization, of the data space severely impacts the quality 
of the k-means solution. Our choice of normalized data 
spaces should be the general one, not to favor certain di¬ 
rections in phase space. However, it is beyond the scope 
of this paper to investigate the details of such choices, as 
concerning distance measures and normalization. 

In signal compression theory, the original data set to be 
compressed or inflated in k-means is often denoted ’train¬ 
ing vectors’ while the solution (the clustered data set) is 
called the ’codebook vectors’. We adopt this terminology 
henceforth. 

Finding the global minimum for any data set in higher 
dimensions in k-means is an NP-hard task. For given val¬ 
ues of M, K and D, the computational effort is approxi¬ 



FIG. 2. Illustration: finding the global minimum in k-means 
is NP-hard for D > 2 dimensions and K > 2. However, for 
approximate solutions, i.e. when finding only a sufficiently 
global minimum, heuristic algorithms converge quickly. Ar¬ 
bitrary data set generated in MATLAB® 


mately 0(M KDJrl logM ) which is intractable for almost 
any PIC code problem we want to consider. If we — on 
the other hand — accept the solution to be only approxi¬ 
mative we can find acceptable alternatives in finite time, 
and even quite fast. Equivalently, an approximate solu¬ 
tion amounts to a local minimum rather than the global 
minimum described by Equation [3|. 


1. K-Means Clustering, Lloyd- Forgy algorithm 


A variety of heuristic algorithms exist; commonly they 
use iterative processes to find a local minimum solution to 
Equation [3j The simplest brute force heuristic algorithm, 
which is also the most expensive, is Lloyd’s algorithm?^ 
Lloyd 14 with Forgy initial conditions Forgy^. We will 
use ’’Lloyd’s” algorithm and ”k-means” interchangably, 
even though the ”k-means” term and a more general 
treatment of vector quantization originates from Mac- 
Queenl 15 . Lloyd-Forgy, or k-means clustering optimiza¬ 
tion goes through three basic steps: 


1. Initial condition: a first guess as to a solution is 

made by placing the initial codebook vector set. 
Forgy’s method at random selects K training vec¬ 
tors as the initial codebook. This often (but not 
always) is better than for example choosing ran¬ 
dom points within the data space. 

2. Cluster assignment: training vectors are assigned 

each to their nearest codebook vector (cluster cen¬ 
ter). This is effectively a Voronoi tesselation step. 

3. codebook replacement: by calculating the 

weighted arithmetic mean (Equation [ 2 ]), based 
on within-cluster associated training vectors, 
new codebook centers are found to replace those 
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codebook vectors found in 2) during the previous 
iteration. 

Steps 2 and 3 are repeated until some defined convergence 
threshold is met; for example, as in this paper, when the 
ratio in total error (eqn. [3| between successive iterations 
changes by less than 1.09bis a common criterion. Figure[3] 
illustrates the algorithm for a two-dimensional case. 



FIG. 3. Weighted Lloyd-Forgy iterative clustering (”k- 
means”) in 2D. Panel 1 : initialize codebook (red disks) at 
randomly chosen training vectors (blue dots), and perform 
Voronoi tesselation. Panel 2: compute new weighted arith¬ 
metic means of training vectors within each Voronoi cell 
and re-define these means as the codebook vectors. Adjust 
weights. NB: some training vectors will migrate to other 
cells (green dots). Dashed lines represent previous iteration; 
Voronoi cell boundaries (red lines), cluster positions (red open 
circles) and migrated training vectors (green lines). Reitera¬ 
tion is performed until a convergence criterion is met. 


The effect of successively tessellating and cluster re- 
centering, respectively, reduces the computational effort 
to 0(M x K xDxi), with i the number of iterations to 
convergence. Nonetheless, even when employing Lloyd- 
Forgy, the computational expense becomes increasingly 
prohibitive for large values of M, K and D. Hence, we 
might expect to discard k-means as feasible for CMP 
merging/splitting in PIC codes, especially for global or 
semi-global simulation volumes. In this paper we demon¬ 
strate its feasibility in terms of physics, rather than con¬ 
sider computational feasibility. Elsewhere (Maly et al. 16 ) 
it is reported that by employing various accelerated par¬ 
titioning and distance calculations, or employing brute 
force GPGPU kernels, the running time is reduced to ac¬ 
ceptable levels, thus demonstrating its feasibility in terms 
of computation as well. 


2. Particle merging — employing k-means 

From the previous section, merging particles many-to- 
many, globally (or semi-globally) in the volume now be¬ 
comes obvious; after the k-means operation, the code¬ 


book will contain all the necessary phase space informa¬ 
tion needed to preserve the physics in the continued sim¬ 
ulation. 

We only need to delete the original particle data (the 
training vectors) and replace them with the new reduced 
particle data set (the codebook) 

{ri,...,r M }{ a ,tr} {ri,---,r K }{ a ,cb} > ( 4 ) 

while conserving total charge, globally, by preserving the 
total weight of the CMPs, pre- and post-compression: 

K M 

^2 W cb = ^2 W tr , (5) 

3 i 

One further constraint is 

N cl 

W c b ^ ^ Wt r y , (6) 

l 

for all N c i intra-cluster particles. Here ’cb’ (’tr’) denoting 
codebook (training) vectors, respectively, and ’s’ denot¬ 
ing species. 

Several schemes exploit the additive properties of 

phase space, and they can be classified according to the 
approaches mentioned in the Introduction. The advan¬ 
tage of a many-to-many (M K, M > K » 1) iter¬ 
ative optimizing approach, like ours, is that we do not 
have to consider specifically, nor analytically, conserva¬ 
tion of energy, momentum, space charge density, current 
density or any higher order moments of the distribution. 
Many degrees of freedom make it possible to satisfy con¬ 
servation laws of physics to high precision 20 . The quality 
of the iterated solution will however be practically con¬ 
strained by computational expense, and by demands on 
the number of particles in the simulation. 

Energy and momentum conservation 

In D dimensions, a particle has D degrees of freedom. 
Momentum conservation demands, then, D constraints 
and energy conservation an additional constraint, for a 
total of D + 1 constraints. Consequently, in any dimen¬ 
sionality D , when merging M particles into K particles, 
the resulting particle number (K) must be strictly larger 
than one, to supply the needed degrees of freedom for 
simultanous momentum and energy conservation. 

The fundamental cluster (unit cell) in a k-means solu¬ 
tion is a Voronoi-cell, by tesselation. A dual set exists, 
which is the Delaunay-triangulation — the Delaunay-cell. 
This dual relation is sketched^ in Figure [i] 

While it is clear that a single Voronoi-cell cannot con¬ 
serve both momentum and energy, simply because the 
single cluster particle does not have sufficient degrees 
of freedom^, it is equally clear from Figure [ 4 ] that the 
Delaunay-cell has sufficient degrees of freedom to provide 
simultaneous conservation of momentum and energy, to 
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FIG. 4. The Voronoi-/Delaunay-cell dual, and their interpre¬ 
tation in terms of training vector set, and code book vector 
solution. Blue dots: original training vectors, red crosses: so¬ 
lution (merged) code book vectors. Grey lines: Voronoi-cell 
boundaries, thick red lines: Delaunay-cell boundaries, green 
lines: individual Delaunay-cells’ share of training vector set 
enclosed in Delaunay-cell volume. Black dashed circles: inte¬ 
gration over shells around a Voronoi-cell center (cluster cen¬ 
ter) for energy conservation test. 


the highest possible accuracy. The Delaunay-cell, which 
is constituted by three (in 2D) cluster center particles 
(codebook vectors) carries the share of phase space infor¬ 
mation from their respective partial neighboring Voronoi- 
cells. 

Further, any Voronoi-cell center (cluster center) is the 
rest frame of any cell calculation, by k-means construc¬ 
tion, namely it is the weigthed arithmetic average of clus¬ 
ter members. Again — by construction — therefore its 
momentum vanishes in that (local) frame, 


The cluster rest mass, 

l 



therefore generally contains the error introduced by loos¬ 
ing the relativistic energy contribution — due to the 
merge — from the T cluster members, p^), under the 
square root in Equation^ 3 . 

Relativistic energy is conserved but not invariant, 
whereas rest mass is invariant but not conserved. We 
therefore interpret the error in relativistic energy, not 
as relativistic mass, and the total rest mass should not 
change. 

The contribution will be small since it is a local rest 
frame contribution, but it cannot be avoided when con¬ 
ducting lossy data compression (a merge); there will al¬ 
ways be an error term when merging particles. This error 
in rest frame energy is what the k-means algorithm min¬ 
imizes (in conjunction with spatial position). 

We can now concretely define what the convergence 
objective, Equation |3j means. For all Voronoi-cells, the 
integrated intra-cluster distance (squared) equals the 
error in relativistic energy associated with the local 
Voronoi-cell merge. The effects of merging particles 
into a cluster will always lead to energy loss (locally 
in that cell), but globally the error will become small 
because the loss of local momenta of the training vectors 
is counter-balanced by other clusters which carry part of 
the missing momentum and energy. 


We have supplied a demonstration of this convergence 
property in energy/momentum conservation, of the k- 
means solution, in Section [III A 


= ( 7 ) 

W cl t 

to convergence criterion accuracy. The Voronoi-cell clus¬ 
ter member particles (training vectors) all have non-zero 
momentum (therefore energy) in this restframe, i.e. 

Ed = -=2 = XA'a/p ? + m o ( c=1 )- (8) 

W ° l l l 

Merging the cluster members onto cluster centers will 
delete local Voronoi-cell information about the differen¬ 
tial energy contributions, namely the terms in the ex¬ 
pression above. The result is a loss of energy from the 
cluster particle — not a loss of rest mass (text below). 


3. Particle splitting 

An accurate method for splitting particles is also 
needed; when the particle number in a cell falls below 
some given threshold, an increase in phase space resolu¬ 
tion becomes imperative — even for physical reasons. 

The problem of placing a large number of splitted par¬ 
ticles in an optimal way, to make maximal use of the 
added CMP resolution in terms of information content, 
amounts to placing the splitted particles as far from their 
mother particles in phase space as permitted. This means 
much farther than a simple random splitting procedure 
as described above. We have sketched this situation in 
Figure |5j 

A number of possibilities exist, many similar in ap¬ 
proach, generally all based on a random or guided dis¬ 
placement in phase space of the mother and child parti¬ 
cles. We add in this paper our novel approach of placing 
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(’prime’ denotes perturbed particles, and S a 6D Gaus¬ 
sian random variable and of order 10 -2 ) energy and mo¬ 
mentum were conserved virtually to machine precision, 
with A E ~ 0( 10 -6 ), see also Figure 
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”On-manifold” splitting; k-means re-distribution 


FIG. 5. Simple split vs complex k-means based split, with 
re-distribution. Blue are the new particles, line designates 
boundary of a Voronoi cell. Particle size is arbitrary (not 
weighted in sketch). Left: The simple split with a Gaussian 
(or other random) re-distribution is rather local and adds lit¬ 
tle phase space resolution. Right: the more costly and com¬ 
plex k-means based splitting uses the phase space information 
already availble to place the newly splitted particles (code¬ 
book vectors) at maximal 6D distance from all other parti¬ 
cles. 


the newly split particles according a symmetry argument 

— exploiting symmetries in the k-means based method 

— to place the new particles according to the weighted k- 
means solution which works for splitting as well as merg¬ 
ing. 


”On-top” splitting; no re-distribution 


Interestingly, a complex k-means splitting procedure 
— based on an argument of symmetry with particle merg¬ 
ing — proves to perform almost to the same accuracy 
in terms of conservation properties (also Fig. 12, right 
panel). Naturally, it is severely more expensive in terms 
of computational effort and memory consumption, yet, 
it re-distributes the additional CMP volume optimally 
spread over phase space, with maximally obtainable dis¬ 
tance between the child (codebook vectors) and mother 
(training vectors) particles. We now describe this ap¬ 
proach to splitting, symmetric with the k-means merging 
scheme. 

We exploit now a symmetry of the k-means procedure 
to obtain an optimal re-distribution of split ’’child” par¬ 
ticles with maximal spread on the phase space manifold. 
Increasing statistical resolution by adding particles can 
be achieved — to the same numerical accuracy as merg¬ 
ing — by keeping all training vectors, and adding the 
codebook vectors 


The i ntuit ively simple ”on-top” splitting into many new 
particles^ 11 is fast, exact, and guarantees perfect en¬ 
ergy and momentum conservation at the time of split¬ 
ting. Particles initially follow identical trajectories, after 
a single ”on-top” split. Subsequently, as multiple splits 
are performed, particles attain unequal weights due to 
random selection of the initial codebook. Since a mother 
particle will carry a different weight than child and grand¬ 
child particles (which now instead carry equal weights), 
it will not in general - now - follow the same path any¬ 
more. As a result integration inaccuracies will develop, 
albeit extremely slowly. Such inaccuracies are generally 
ignorable over the course of an entire simulation. While 
a pure, exact, ”on-top” split is always superior in perfor¬ 
mance and conservation of physics, such a pure ”on-top” 
split, it does nothing to improve the fidelity of the sim¬ 
ulation, only, it adds particles with no additional infor¬ 
mation content — hardly a gain. 


”On-top” splitting; simple re-distribution 

A Gaussian (or even random) perturbation to particle 
pairs generated in the on-top split can safely be applied 
(e.gP 8 ), if it is desired to have particle pairs more quickly 
depart from exactly coinciding trajectories, for a coarse 
increase of phase space resolution. 

We also tested this idea and can confirm that, even for 
perturbations (post-split) as large as r[ 2 = ^ 1 , 2 +^• 7 * 1,2 , 


{t*1 , . . . , Vm }{s,tr} ^ {^* 1 5 • ■ • 5 1 5 • • • 5 ^K }{s,cfr} • 

( 10 ) 

Effectively, all new particles (codebook vectors), are 
placed precisely on the 6D phase space manifold, but 
at positions different from those of the original particles 
(training vectors), in 6D phase space. This amounts to a 
k-means phase space inflation of type M —» K M+iF, 
in terms of number of CMPs. 

The only difference from merging is that we need to re¬ 
distribute the total weight of the original particle data, 
on both training vectors and codebook vectors 

K M M 

E Web + E Wtr = E W 'tr > ( n ) 

j i i 

where now the ' (prime) denotes values before performing 
k-means. Practically, the re-distribution of weights in 
our k-means based splitting scheme is done by sharing 
the weights between training vectors and their associated 
codebook vector in proportion to the training vectors’ 
weights. The total amount of weight, w c b, given to a 
codebook (cluster center), is the average value of those 
intra-cluster training vectors, 

Web = £ (Wcl) — , (w c i) = ( 12 ) 

W c b 

is the mean weight in that cluster, and 

N cl 

w c i= 

1 


(13) 
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Energy balance, on —manifold split 



FIG. 6. Relative error in total energy for a single on-manifold 
split as function of the parameter £. For a split with £ = 0.25 
which corresponds to equal sharing of weight between training 
vectors (original set) and the codebook (splitted particles), 
for N c b = 1 /^N tr the relative error is less than 0.3%. For £ = 
0.001 — or 0.1% transfer of the total weight — the solution 
is still not as good as the Gaussian on-top split (not plotted 
here) in terms of energetics. The splitted particles are not 
contributing much to the dynamics, but they are still placed 
optimally under the objective of placing particles as far apart 
in phase space as possible. So, there is a trade-off between 
precision and weight transfer. 


and wi are the individual weights of the N c i training vec¬ 
tors in that cluster. £ is a free parameter to either make 
the cluster receive less (£ < 1) or more (£ > 1) weight 
from the cluster members. The error introduced by the 
on-manifold split is linearly proportional to £, as seen 
from Figure [6j 

Splitting is, like merging, done under constraints of the 
mock edge preserving scheme, described in Section |Tl A 4| 

Particle splitting, which could have been simply 0(K ), 
now becomes a rather expensive — as expensive as k- 
means for merging — 0(M x K x D x i) once again. 
But; when taking into account an accelerated k-means al¬ 
gorithm (Maly et alJ®), the computational feasibility of 
both splitting and merging is achieved; we can afford the 
extra care taken in undersampling (oversampling) par¬ 
ticles phase space, for minimally (maximally) decreased 
(increased) statistical resolution. 


”On-top” vs ”On-manifold” splitting — a check 


We checked the performance of the ”on-manifold” k- 
means based splitting/re-distribution scheme, and the 
simple ”on-top” splitting/re-distribution scheme (with 
Gaussian perturbation of order 1%), for a comparison 
of their ability to conserve energy over a series of splits. 
This test is comprised in the splitting stress test which is 
described further in Section [HI Dl 


In Figure 12 (left panel) we plot the ratio of globally 
integrated particle energy, i.e. 


Ekn 


E Ntr+cb 

i=l,s G,s 


E 


ref 


s P Ntr e - 

Z*li = l,s e *,S 


(14) 


with respect to a reference run where no splits are 
performed, as a function of time. All splitting methods 
conserve total energy to about 0(1O -3 ), for the duration 
of the test run (figure 12, right panel). 


In summary, this counter-intuitive ”On-manifold” ap¬ 
proach to splitting does perform quite well. In terms 
of reproducing and conserving the physics (only), it is 
competitive, even if it falls short w.r.t. computational 
effort and memory consumption. We demonstrate the 
effectiveness of the K-Means based splitting method in 
the following sections on tests. Even if it is compara¬ 
bly expensive it could be desirable to employ the more 
expensive scheme in certain situations where maximal in¬ 
formation content is to be extracted from the split parti¬ 
cles, in a generic PIC code context. However, it is beyond 
the scope of this article to investigate the specifics of all 
physical scenarios where a complex split might be worth¬ 
while choosing over a simple ”on-top” split. Here we have 
merely argued that the symmetric operation, using k- 
means for splitting, performs well while optimizing phase 
space spanning of the added statistical resolution (more 
particles). We also found for all other cases (not plot¬ 
ted here) that the ’on-manifold’ splitting deviates from 
reference slower than the ’on-top’ +Gaussian (1% dis¬ 
tortion in r) after many splits - which hints that care¬ 
ful placement of added particles suppresses noise, which 
then grows with a Lyapunov exponent smaller than that 
of Gaussian noisy ’on-top’ split. 

All tests throughout the remainder of this article (Sec¬ 
tion IIA 3| primarily) have been performed with the k- 
means splitting method. Detailed work on Arn-top” split¬ 
ting is wide spread across the literature^ 10 12 17 l 18 l An 
exhaustively detailed comparison study of ”on-top” vs 
”on-manifold” splitting is beyond the scope of the present 
article. 


4. Contractive artifacts of K-Means 

The convex hull of a k-means solution will always con¬ 
tract with respect to the original data set volume, i.e. 

f dSlf(r,t) < f dfl 1 f'(r,t) , (15) 

Jfl Jft' 

except for the trivial case (/' == /). Here L2, ft' G 
are the convex hull bounding surfaces of the CMP 
density distribution in D dimensions before and after k- 
means compression. We have sketched this — for PIC 
code applications undesirable property of k-means — in 
Figure [f] for D=2. 

For PIC codes which are parallelized over computa¬ 
tional processes via domain decomposition in real space 

















FIG. 7. Phase space volume contraction associated with k- 
means clustering in 2D. The convex portion of phase space 
(red perimeter line) spanned by the codebook clusters (red 
dots) will always be smaller than portion of phase space (blue 
perimeter line) spanned by the training vectors (blue dots). 
This effect is undesirable effect as it leads to edge depletion 
effects for p c (r) and J(r,p). 


(r = {r x , r y , r z }) this is problematic because a given 
volume will experience edge artifacts in charge density, 
p c (r), and current density, J(r), namely a reduction 
of particles’ contribution to those physical quantities. 
Domain decomposition is often employed in PIC codes, 
making such an edge preserving step indispensable. 

To alleviate this problem we devise a simple mock edge 
preserving correction scheme, which is illustrated in Fig¬ 
ure [9j The idea is simply to let the clustered codebook 
solution approach the original training vector set on the 
domain boundaries. The boundary thickness is presently 
defined as equal to two cells of width. Is this way we 
ensure that edges are left untouched. In terms of compu¬ 
tation, this leads to an extra iteration which we estimate 
at effort 0(M x K), thus not severe (yet not ignorable) in 
the total budget. Furthermore, the final number of code¬ 
book vectors will be slightly larger than the target value 
for large volumes and approach the original number of 
training vectors when the volume in question approaches 
PIC code cell size. 



FIG. 8. Our mock edge preserving procedure, for spatially 
domain decomposed simulations. The procedure is applied 
only in those dimensions, where the convexity of the k-means 
leads to systematic edge effects. Shown in the inset on the 
right are: kept/omitted codebook vectors in dark blue/light 
blue, kept/deleted training vectors in green/red. See main 
text for further explanation. 


This mock edge preserving procedure is simple; after 
having found a codebook solution on the entire domain 
(including the boundary region), the codebook vectors in 


the domain boundary are deleted, and the training vec¬ 
tors kept here instead. On the interior of the domain 
(excluding boundaries), the codebook is reduced if the 
cluster falls inside but has training vector members in 
the domain boundary. If a codebook vector resides on 
the interior and has all training vector members on the 
interior as well, the codebook is kept as-is and the train¬ 
ing vector members are deleted. 


Z-Pz phase sub-space, two-stream instability 



Z 


FIG. 9. 2-dimensional {Z, P z } phase subspace. The con¬ 
tractive artifact of k-means in principle demands edge cor¬ 
rection even in momentum space (red). One choice could be 
a contouring or swarm-based image segmentation procedure 
which could efficiently detect and maintain edges in momen¬ 
tum space. The real space boundaries (blue) are discussed in 
the section on our mock edge-correction for the (MPI) domain 
boundaries.1 

Another issues concerning the contractive ’’feature” of 
k-means pertains to contraction in momentum space. 
While the domain boundaries are well defined in real 
space, r, the clustering will contract momenta along ap¬ 
proximate boundary contours in momentum space. This 
is sketched in figure ?? where we see that a stream¬ 
ing instability has a very complicated, sometimes even 
non-continuous, structure with holes, islands and wavy 
bounds. A mock edge preserving scheme which could al¬ 
leviate momentum space contraction might be a contour¬ 
ing image segmentation procedure which could identify 
important contours in momentum space. This would sig¬ 
nificantly improve the overall ability to conserve not only 
charge distribution (real space egdes) but also energy and 
currents (momentum space ’’edges”). 

In Section [TTT| we demonstrate that, despite its sim¬ 
plicity this edge preserving scheme manages to suppress 
edge-effects introduced by the contractive artifacts of k- 
means clustering, significantly. This can be appreciated 
from Figure [Tl] 

III. PROOF OF CONCEPT: ’BARBARA’ TESTS 

We proceed to demonstrate k-means based merg¬ 
ing/splitting feasibility in terms of preservation of physics 
with heavily varying particle numbers. 






















9 


All tests in the remainder of this article have been per¬ 
formed using a slightly modified setup of a simple 2D3V 
relativistic two-stream simulation, used for tests of the 
Photon-Plasma code 7 . A relativistic neutral electron- 
ion beam is streming through a neutral electron-ion back¬ 
ground at Y bearn = 3, with density ratio n b /n bg = 1/3. 
The dynamics are thought to be of relevance in cases such 
as Gamma-Ray-Burst afterglow shocks in a circumburst 
medium 5 . 

Our reference ’barbara 5 case in the present paper has 
grid size N XjZ = 128, N y = 1, physical size L XjZ = 126 e = 
3Si, L y = 1.26 e = 0.3£i, rrii/m e = 16, beam Lorentz 
factor T b = 3, beam-to-background density n p /n b = 1/3, 
uj pe ,o ~ 12, S e = 0.0856, so S e /A x « 8.6. Time step 
At = 0.00391, t end = 10.0 « 120a;, 


pe 


30a;-/, N p = 30 


in the background and N b = 10 in the beam plasma per 
cell/species; a total of 80 particles/cell. 

The detailed reference simulation setup is not impor¬ 
tant for our tests; the only objective is to see how well we 
preserve the physics w.r.t. a reference case. Throughout 
this Section, the ’reference run’ denotes the instance of 
’barbara’ which is devoid of performing merging and 
splitting. 

Although our simulations are setup in a quasi-2D3V re¬ 
duced dimensionality, this does not influence our k-means 
tests; particles still have a single cell’s degree of freedom, 
even in the F-coordinate. When we take into account the 
normalization of data space (see Section IIA), we will a 
have a truly 3D3V phase space manifold to work with. 

The simulations were all done on a 4x4 MPI domain 
decomposed geometry using simply the MPI processes as 
our k-means spatial domains. Still, domain sizes are not 
limited in any way, except for a lower bound on volume 
of a few cells in each spatial dimension. This is because 
the edge preserving scheme will make the solution ap¬ 
proach the original phase space density for very small 
volumes of order a few cells, Vkmeans = kA x lA y mA z , 
where {fc, Z, m} —?► {1,1,1}. 

We have verified the binary authenticity of successive 
reference runs, and that runs of ’barbara 5 , using ac¬ 
tual merging/splitting, were also binarily identical to the 
reference run - up to the point of first k-means, of course. 


A. Basic tests: energy-momentum conservation 

To demonstrate the energy-momentum conservation 
property of the k-means merging scheme, as discussed 
in Section |HA~2l we conducted a simple shell-based en¬ 
ergy integration test. It shows that the error in energy 
is local to the cluster at the Voronoi-cell center; it does 
not influence the global conservation properties of the 
k-means solution. 

The test was done for four instances of ’barbara’ with 
varying particle number densities, at an early stage of 
evolution, when the background plasma can still be con¬ 
sidered thermal and uniform. We tested a single species 
(electrons) of the background plasma. 


Our test measures the difference in total energy be¬ 
tween a pre- and post-merged solution. We measure the 
error in energy as we integrate the particles’ energy inside 
successively expanding shells centered on the Voronoi-cell 
center, in real 3D space. These shells are indicated in 
Figure [4] (black dashed circles), where the plot now rep¬ 
resents the real space clustering, rather than momentum 
space or genera l phase space. According to our argument 
(Section |lI A 2 ) when considering a volume containing the 
surrounding Delaunay-cells on expanding shells, the error 
should become small. 


Shell seorch, energy convergence 



FIG. 10. Integration over shells of varying radius, around 
a Voronoi-cell center (cluster center) for energy conservation 
test, for four cases of initial particle densities. Upper : the 
interior of the domain (where no edge correction is applied. 
Lower : in the domain boundary, where edge correction is 
maximally applied. The edge correction provides slightly bet¬ 
ter conservation properties, of course, since the original solu¬ 
tion is kept on the boundary. For r s h ~ 5Ax the edge case is 
about 1% for all cases except case 4ppc (~5%). 


Indeed, the energy conservation converges as expected 
when the neigboring Voronoi-cells (clusters) are included 


in the integrated energy. This is shown in Figure 10 


where the energy ratio, E tot ^/E tot j is plotted for four 
cases of initial particle density, or ’particles-per-cell’ 
(’ppc’). The energy is conserved as soon as the surround¬ 
ing the dual set Delaunay-cells have been included in the 
energy calculation. This underlines the non-local nature 
of the solution, but also shows that the non-local error 
is actually rather almost-local, as it becomes small (and 
constant) at a radius of only 2 cell radii; the error is op¬ 
timally small. The error will always bounded from below 
by the number of particles, the compression ratio and the 
convergence criterion set for k-means (the minimizing of 
distance measure, eqn. [3|. 

The energy is conserved to about 5% (for 4 ppc) when 
the full nearest neighbor Delaunay-cells are enclosed in 
the integration shell, r s h ~ 2Ax. Energy is conserved 
slightly better for 10, 20, and 40 ppc at r s h ^ 2Ax, and at 
r s h ~ 4Ax, with values ranging from is 3%-l% for those 
latter three cases. The same conclusion is also reached for 
the momentum, conponent-wise. The same conclusion is 
reached for streaming particle species (which is just a 
similar k-means procedure in a mean Lorentz boosted 
frame). 

Energy and momentum conservation does, however, 
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become challenged on the k-means’ domain boundaries 
(MPI domain bounds). This is our motivation for devis¬ 
ing a mock edge preserving scheme (Section IIA 4 below). 

Our mock scheme is a quick solution, and can be fur¬ 
ther improved by using proper Delaunay-cell based selec¬ 
tion of particles. Possibly including a window function 
smoother than a step function filter towards the edge, 
e.g. with a discontinuous step at Ar could improve this 
edge correction even further. It is beyond the scope of 
this work to devise the perfect edge correction, but within 
scope to demonstrate the need for such a feature - and 
its ability to mitigate edge effects. 


B. Basic tests: mock edge preserving scheme 


As previously explained (section [il A4 ), the k-means 
procedure possesses an intrinsic and undesired property 
when it comes to preserving particle phase space struc¬ 
ture in PIC simulations. Since any calculation exploit¬ 
ing arithmetic means will produce volumes smaller than 
the original one, a domain decomposed PIC simulation 
will suffer boundary effects in the domain decomposition 
dimensions. For our case of the Photon-Plasma code, 
this is in p hase subspace r^D = { x , y , zj (see also Sec¬ 
tion IIA4). In fact, it will even suffer this constraint in 
the domain non-decomposed dimensions (here momen¬ 
tum phase subspace, r^v = {Px,Py,Pz})- 

The domain boundaries, in position (’’physical”) space, 
are completely predictable. It makes sense to counter¬ 
balance convexity issues of the k-means based procedure 
based on a spatial filtering of the real space coordinate 
boundaries. We regard r^D = { x ,y, z j as correctable. 
Momentum subspace boundaries, r^v = {Px,Py,Pz}, are 
regarded as non-correctable since in any trivial - they 
will generally not be regular. In the -direction, how¬ 
ever, things are not so predictable. We stress that we 
have not decoupled 6D phase space reconstruction by this 
procedure, only, we have ensured the convergence of the 
compressed (or inflated) k-means solution in a subset of 
dimensions. We have introduced no decoherence between 
position and momentum at all by this edge-preserving 
correction. 

Two tests were performed to check the mock edge¬ 
preserving scheme performance 


Thermal cases: with little or no bulk flow which leads 
to little or no replenishment of domain bound¬ 
aries, keeping the domain boundaries quasi-static 
in terms of phase space evolution. 

Streaming cases: the well evolved two-stream simula¬ 
tions would produce extreme replenishment of do¬ 
main boundaries which will test the scheme’s abil¬ 
ity to render advection across boundaries transpar¬ 
ent. 


These two extreme dynamics are often realized in PIC 
codes, even simultaneously. 
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FIG. 11. Test of the mock-up edge preserving procedure. 
(A) (B): a single merge is performed with the usual PIC 
integration cycle completely omitted; the pure effect of the 
merge scheme is captured. Panel A(B) shows the merge with¬ 
out (with) edge-preservation effort, respectively. (C) and (D): 
same, but now a single SPLIT is performed instead. 


The test results from the thermal case are shown in 
Figure [TTJ We performed a single merge (split) — and 
nothing else — using the ’raw’ non-corrected k-means al¬ 
gorithm, and a single merge (split) with our mock edge¬ 
preserving correction added to the k-means solver. The 
MPI domain boundaries are clearly visible. As expected 
the effect is more severe in the merging case since a split 
retains clusters with in-boundary particles whereas merg¬ 
ing removes them. Still, even for merging, our edge¬ 
preserving correction yields significantly improved con¬ 
servation properties. Not surprisingly both cases of split¬ 
ting method were very accurate, to better than 0.1% (also 
figure [l2]). Particle merging naturally showed a minor de¬ 
cay in the solution across the merge step since it is a case 
of lossy compression. Still, the edge-preserving scheme 
forces the correct solution when approaching the real 
phase subspace boundaries, i.e. f'(x,y,z) —> f(x,y,z) 
for {x,y,z} {x,7/,z} min;max . The solution following 
several integrations would not decay rapidly. This result 
was reinforced by the streaming test case showing no ap¬ 
preciable discontinuities and ability to smooth away the 
boundary ’shadows’ introduced by the contractive aspect 
of k-means arithmetic averaging. 


C. Basic tests: pure merging & splitting 

A stress test was performed to ascertain the quality 
and longevity of clustered solutions under what we de¬ 
fined as ’extreme’ conditions. Multiple merges (or splits), 
only, were performed successively until the solution were 
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no longer meaningful when comparing with the (con¬ 
stant particle number) reference run. After 400 itera¬ 
tions (33UJ-J-), we either only merged or only split the 
total particle number six times over the course of an ad¬ 
ditional 500 iterations (42 

The merging stress test successively attempted removal 
of 2/3 of the particles (Nf = A^/3), for all species, on 
all MPI processes. Thus, from the first merge to the 
last merge, the number of particles would be Nf/Ni = 
(1/3) 6 ~ 0.001, had the fraction been exactly 1 / 3 . How¬ 
ever, since we empl oyed the mock edge preserving scheme 
(see Section IIA 4) included which limits the boundaries 
to the true solution CMP number density, the remaining 
particle number fraction in each merging step was some¬ 
what higher than 1/3. In fact the final-to-initial CMP 
total particle number was rather Nf/Ni ~ 0.02. 

In a similar fashion, we conducted a k-means based 
splitting test (a less severe problem), which successively 
attempted to add 1/3 of the particles (Nf = 4A^/3), for 
all species, on all MPI processes, to the new solution. 
We emphasize our employment, here, of the ex pensiv e 
on-manifold k-means based splitting (see Section [Tl A3 ). 
Now, instead, the number of particles would have in¬ 
creased to Nf/Ni = (4/3) 6 ~ 5.6, had the added fraction 
been exactly 1/3. Again, the mock edge preserving 
scheme limited the true solution total number of CMPs 
to Nf/Ni ~ 4.7. 


Figure [T2] plots the energy conservation results for a 
selected species, for the merging/splitting stress tests 
(left/right panels). 

During the first merge which reduces the total parti¬ 
cle number to less than half, errors in total energy be¬ 
tween 1% and 3% are introduced by the merge (left panel, 
fig. 12). A higher number of particles per cell (’ppc’) 


leads to lower discrepancy although the variation is not 
a strong function of ’ppc’. Further, we find that the er¬ 
ror grows slowly with successive merges, to a maximum 
of 10%, only when the number of particles has been re¬ 
duced by about 98%. 


MtKUL 


b>HUI 



FIG. 12. Relative error for pure merging and splitting stress 
tests, with respect to a reference case. Left panel - merg¬ 
ing , varying particle number. Right panel - splitting , varying 
method. 


While the ”on-top” split is superior (right panel, 
fig. [T 2 ] ) , it does not give way for improved fidelity. The 
”on-manifold” has larger error initially (as large as 0.1% 


at times) but does provide a more full span of the 
phase space manifold. The test used £ = 0.25 (see also 
discussion of Equation 12). The Maxwellian perturbed 
split population does no better than the ”on-manifold”, 
in some cases worse, for many successive splits and 
iterations. 


The resulting evolutions of the merged and split cases 
are also worth comparing for a quantity which is solved 
for by particle-mesh interpolation and subsequent time 
integration, i.e. comparing Maxwell’s source terms, J 
and p c . In Figure 13 and Figure 14 we compare B y (x , z) 


for splitting (left panel), reference (middle panel) and 
merging (right panel) cases, for two splits (or merges) 
over 200 iterations, and four splits (or merges) over 400 
iterations, respectively. All three panels are contoured 
on the same color scale, in each of the figures separately; 
they are directly comparable. Structure, phase, spectral 
properties are affected severely only after 4 merges, which 
introduces noise due to the decrease in particle numbers 
by ^92%. 

Not surprisingly, the splitting produces an almost per¬ 
fect match with respect to the reference case for all cases. 
Also as expected, the merged simulation shows increased 
levels of Poisson noise in the field as the number of CMPs 
is reduced. Still, after having removed more than 75% of 
the original data set, and after several tens of iterations, 
the solution is still very good. Even after having removed 
more than 90% of the particles over an additional 200 it¬ 
erations, noise levels have risen considerably overall. The 
global evolution is well preserved. Although high-k noise 
has been introduced, global field structures are still well 
represented. We checked the Fourier spectra which show 
this behavior as well; high-k modes are rising during a 
merge stage, but the spectrum remains largely unaltered 
for low- and intermediate-k wavenumbers. 



FIG. 13. Comparison of B y (x,z) for a hard stress test of 
merge (right panel) and split (left panel) with the reference 
case (middle panel), at time t ~ 23cj“ e 1 . The number of 
particles has been merged (split) twice, into Nf/Ni ~ 0.24 
(Nf/Ni ~ 1.68) over the course of 200 iterations. 


After four merges, the total field energy has increased 
by rsj 5%, whereas the particle energy has decreased by 
the same amount. This effect is also briefly mentioned in 
Section [niEj where the total energy budget is plotted for 
a full ’barbara’ run, in Fig. We interpret this counter¬ 
balance between fields and particles, as an increase in 

















12 


fields’ noise on all scales due to the lower number of par¬ 
ticles after a merge, from grid scale to systemwide scale. 
In future work it should be investigated whether this ef¬ 
fect is robust; that would be a positive ’’feature” of the 
noise properties of the k-means global merging scheme. 



FIG. 14. Same as Figure [l3l but now, at time t ~ 32cj~ e 1 
. The total number of particles has been merged (split) into 
Nf/Ni « 0.07 ( Nf/Ni ~ 2.8) in four steps over the course of 
400 iterations. The stress test has severely increased noise in 
the merge case. 


This stress test qualitatively gauges the severity of 
compression parameters and resilience of compressed so¬ 
lutions. Much like image compression, we see that a re¬ 
duction works well even for very high compression ratios, 
like for example compressing bitmapped images (.bmp) 
to Joint Photographic Experts Group images (.jpg), al¬ 
though the compression algorithms for images are likely 
more sophisticated in the latter case. 

A first, back-of-the-envelope quantification of validity 
of compression solutions hints that we should not 
compress (merge) successively more than 2 or 3 times 
without replenishing the CMP ensemble (either by 
advection or by production through collisions or simple 
statistical splitting). Furthermore, the compression 
ratio should not exceed about Timerge = Nf/Ni ~ 2/3 
and not be performed successively within less than 
about a plasma-oscillation (10-20 iterations here). This 
estimate is based on empirical evaluation, and should be 
subjected to refined studies and analysis. 


D. Basic tests: k-means, thermal distribution 

To test the k-means procedure’s ability to retain a ther¬ 
mal ensemble, an alternating splitting-and-merging tests 
was performed. Using the ’barbara’ setup, with r& = 1 
(no beam), we alternated between splitting and merg¬ 
ing for 39 time steps, with At = 0, for a total of 19 splits 
and 20 merges, interwined.6 tests were run, with temper¬ 
atures T G [10 -1 ,..., 10 -6 ] (natural units, with c = 1) 
to check for sensitivity to momentum magnitude of the 
ensemble. 

We plo t the result concerning energy conservation in 
figure |l5| for the case T e = T % = 10 -5 (ks = 1, c = 1), 
calculated in nomalized data space (left panel). For ref¬ 
erence, we plot also the energy ’’conservation” when cal- 
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FIG. 15. Energy conservation thermal tests, plotted for 
species 0. Successive alternating merging (splitting) 20 (19) 
times, for a total of 39 k-means operations. Shown are cases 
comparing non-normalized and normalized data (see also sec¬ 
tion HA] All other species show the same behavior. 


culating k-means in non-normalized data space (right 
panel), which underlines our claim that care should be 
taken not to ’’disfavor” (or underweigh) any dimension(s) 
in phase space, thus warranting our statement about nor¬ 
malizing the data. The plot also shows that merging 
conserves energy worse than splitting, as expected. 

The thermal successive split/merge test employed the 
complex ’on-manifold’ splitting scheme. Better energy 
conservation could be expected with a simple ’on-top’ 
split since particles would simply cluster and re-fragment 
with no gain or loss in fidelity - thus being less useful. 


E. Full scale automated merge/split test 


We conducted two tests with the full scale automated 
MPI-domain based merging-splitting activated: a ’’wide” 
and ’’narrow” tolerance range test would decide how well, 
and how often splitting and merging should be employed. 
Again, the test is conducted on the ’barbara’ case from 
above. 


’’Wide”: tolerance yielded splitting when, for any MPI 
domain, N p < Ni ow = 0.667N opt and merging when 
N p > N high = 1.333 N opt 

’’Narrow”: tolerance yielded splitting when, for any 
MPI domain, N p < Ni ow = 0.9 N opt and merging 
when N p > N high = l.\N opt . 


Due to the more restricted tolerance, the ’’narrow” test 
case yielded about twice as many splits and merges dur¬ 
ing the entire simulation, therefore also twice as many 
passes through the domain boundary edge-filtering. 

We have plotted the differences w.r.t. reference 
when running raw, respectively edge-preserving, k- 
means in Figure 16, for the ’’narrow” (panels ’A’=raw 
and ’B’=edge-preserving) respectively ’’wide” (panels 
’C’=raw and ’D’=edge-preserving) tests. 

The effect of the more frequent splits/merges for the 
’’narrow” case shows that traces of the MPI boundaries 
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AE (ref,raw), in total electromagnetic field energy, and 
the negative difference, Ai?(edge,ref), in total particle 
energy, between the reference and ’’wide” test cases. 
Although the drift in EM energy is relatively large over 
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FIG. 16. Test of automatic merging and splitting. (A) and 
(B): procedure employing narrow tolerance range for the cases 
of ’’raw” and ’’edge-preserving” cases, respectively. (C) and 
(D): procedure employing wide tolerance range for the cases 
of ’’raw” and ’’edge-preserving” cases, respectively. The mock 
edge-preserving method is clearly superior. 
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FIG. 17. Drifts in electromagnetic field energy (blue and red) 
and total negative particle energy (green, offset by -0.0001 
for clarity) in the ’’wide” automated test case, using either a 
’’raw ” (blue ) or an ’’edge-preserving” (red) methods (see sec- 


IIA 4| on edge preserving mock-up procedure). Inset: 


tion 

total EM energy in the simulation volume as function of iter¬ 
ations. 


are visible in both the raw and edge-preserving cases. 
Yet, the quality of the edge-preserving scheme is still su¬ 
perior by a factor ~2-5, i.e. 


[A p„ 


A p, 


min\RAW 


[Apm ax A Pmi n \ EDGE 


2 - to - 5 


(16) 


For the ’’wide” tolerance case, the more infrequent 
need for splits or merges yields improvement in the 
handling of edge effects at domain boundaries. Here, 
again, the edge-preserving scheme is justified, but now 
much better, namely by a factor 5-to-10 improvement 
(calculated as in equation 16 


time (about 1 • 10 -4 /2 • 10 -3 ~ 0.05 = 5.0%), this energy 
drift is compensated by an anti-correlated drift in the 
total particle energy (see Figure [l7|). The drift in total 
particle energy is partially due to the convex artifacts of 
k-means, operating in momentum space - which leads to 
artificial cooling, see also discussion in section on tests. 
The total energy deviates less than 0.5% from reference 
in the case of wide tolerance range at the end of the sim¬ 
ulation, after more than 200 merges and splits over more 
than 2000 iterations. 


For the remainder of this section we concentrate on the 
’’wide” tolerance case. More than 2500 iterations, 100 
merges, and 100 splits, were performed (~6 splits and 
^6 merges for each MPI domain). The splitting/merging 
kicks in at approximately the end of linear instability 
growth; this is expected since growth of current filaments, 
and charge separation results in concentrations of com¬ 
putational particles at that approximate time. 

Energy conservation 

Energy conservation and momentum conservation is 
striking, when we also look to the preservation of struc¬ 
tures — both spectral and spatial. In Figure [T7| is plot¬ 
ted the difference, AE (ref, edge) = E rc [(t) — E e a ge (t), 


Momentum conservation 

Likewise, we can study the total momentum evolu¬ 
tion, but it would yield little new information. Rather, 
it makes sense to look at a ’’critical” component of the 
momentum; the ion beam momentum in the streaming 
direction. We have plotted the time evolution for this 
quantity in Figure [l8j It is unnecessary to plot any 
other histograms in phase subspace since all directions 
are equally valuable in the k-means optimization proce¬ 
dure; they will show comparative accuracy. There is a 
slight shift of the k-means treated runs (red curve) in 
the histograms as time progresses. We interpret this in 
connection with the conclusions concerning energy as a 
loss of energy transfer between particles and fields, which 
leads to a slower slow-down of the ion beam. 
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FIG. 18. Sample histogram of momentum space, ion beam 
species, along streaming direction, {P z , N p (P z )}, for various 
times from just prior (upper left panel) to first merge, until 
about 1100 iterations (~48cjpe) later. 


Particle weight distribution evolution 

Figure [19] plots the evolution of particle weights 
(all particles) as the simulation progresses. From an 
initial constant weight, Wi n u = 0.3, weights become 
distributed in a uniform manner over a wide range of 
values, as new generations of particles appear due to 
merging and splitting. This is desirable in terms of 


f(w(t)), ion beam 



FIG. 19. Time evolution of particle weights, ion beam. Red 
dashes show the initial weight of all particles. Black curve fi¬ 
nal weights distribution at t=10.0 (#it = 2553). Dark-to-light 
colored curves correspond to early-to-late time distributions, 
over time intervals of 128 iterations. 


statistical evolution; phase space information is now also 
spread over a wide range in weights, and not only in 
phase space. We can more safely destroy particles at 


random without risking serious biasing effects on the 
physics in the process. For example for Monte Carlo 
modeling of collisional processes, a larger sampling space 
is available if more particles with smaller weights (and 
difeerent positions in 6D phase space) are available for 
the scattering process. Another advantage is that nearly 
empty, or at least very low-density, regions will also 
be more densely populated with particle phase space 
information. 


f(x,z), beam ions, EDGE, t= 10.0018 
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FIG. 20. Phase subspace { x , z, Logio(w)}, for illustrative pur¬ 
poses. Yellow dots are late generation particles, size shows 
weight, while blue particles show concurrent reference run. 
Minor discrepancies in the local number density is due to a 
1/100 stridden sampling — and, to some degree, the collapse 
of phase space from 6D to 2D. This plot can be compared 
directly with Figures [21] and [I9| 


Figure [2l] compares the beam ion density at the very last 
time step, after more than 2500 iterations and more than 
100 splits and 100 merges. The result demonstrates that 
the solution stays stable for rather long times. There is 
a tendency for the particles to clump due to the frequent 
merges; the edge-preserving performs slightly better in 
avoiding clumps, and better preserves large scale struc¬ 
ture and flow — by a marginal measure. 


IV. DISCUSSION & CONCLUDING REMARKS 


It is important to realize that our k-means compres¬ 
sion and inflation of CMP data in PIC codes is a global 
method, which ’feels’ all modes present in the volume 
at hand. Thus, the method will preserve all modes in¬ 
creasingly well for decreasing wavenumbers considered. 
This contrasts methods outlined during the Introduc¬ 
tion, which conserve the physics locally , rather. We 
briefly discuss here some obvious limitations of the k- 
means scheme. We then discuss how these limitations 
may not be too severe, considering a range of applica¬ 
tions for which a global domain method can be utilized. 
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FIG. 21. Comparison between three runs of ’’Barbara’* in 
the ’’wide” test case. Contour plots show beam ion density, 
during automatic merging and splitting on the wide tolerance 
interval. Approximately 100 merging and 100 splitting events 
occurred in total on the 16 MPI domains, for an average of 
about 12 splitting and merging events per domain. We have 
outlined the lowest level contour (thin white lines) to aug¬ 
ment the finest level differences clearly. Number of contour 
levels is 256, max(pb,0 ~ 21.0, min (pb,i) ~ 0.0. Left panel: 
raw k-means, no edge filtering. Middle panel: reference simu¬ 
lation (no k-means), Right panel: edge filtered k-means. The 
edge-preserving method does perform slightly better, but the 
differences are very small. 


Applicability limitations on domain sizes 

There is a trade-off between the size of domains on 
which the method is deployed (here sizable MPI domains) 
and the speed at which the procedure can be executed. 

bkmeans —> Vce\\ : In this case, our model approaches the 
full solution pre-merge/-split. We cannot go to cell 
sized domains due to the convexity issue, which 
then becomes either detrimental to conservation of 
the physics, or becomes nullified due to retainment 
of the full training vector set in question. 

bkmeans —> VtotaF In this case, the edge effects ap¬ 
proaches a negligible contribution. We can encom¬ 
pass the entire domain; even though super-linear 
scaling becomes prohibitive for performance this 
is only a practical limitation, not a physical one. 
Also, the compression ratio parameter is relieved of 
any restrictions imposed by convexity in this limit. 

While the latter limit can be remedied by accelerated 
algorithms or hardware acceleration (see below), the for¬ 
mer cannot. We are bound by a lower limit on domain 
volumes to obtain a reasonable compression factor. The 
domain volume boundary thickness, Wbound? must be 
small compared with the domain interior, along all di¬ 
mensions. W bound = lAx « L x = 32 Ax (per MPI 
domain) in our study above. Still, a domain volume gran¬ 
ularity leading to W bound = l Ax < L x = [a few} • Ax 
should be possible. 


Demands for memory 

A serious drawback of the current implementation is 
memory overhead; worst case, we need simultaneous stor¬ 
age for the codebook, as well as training-to-codebook 
vector mapping of particle IDs, and count of training 
vectors-per-codebook vector. The total overhead then 
becomes (7[real] + 7 /s [integer])N opt , which should be 
compared with the normal need for a PIC representa¬ 
tion (when not employing merging/splitting) of simply 
7[real]N opt particles: an overhead which doubles the 
memory need. 

Careful implementation and re-use of allocated space 
can reduce the need for memory, but at present we 
see no way to remove the need for storage of the full 
codebook vector data set for the ’k-means’ sceheme. 
This could improve in the future. 

One quite obvious way to remove the problem of overhead 
in connection with the codebook construction might be 
to replace the weighted ’k-means’ step with a weigthed 
’k-medoids’ scheme. Training vectors are then taken as 
codebook centers, thus re-cycling previously allocated 
memory in an elegant and efficient way. We are currently 
investigating whether ’k-medoids’ will also perform ade¬ 
quately with respect to the physics — this is not given 
a priori It is of similar computational complexity as 
’k-means’. 


Acceleration of K-Means clustering 

The standard Lloyd’s k-means is too slow, even pro¬ 
hibitively so. We cannot obtain a process which is faster 
than about 0(M x K x D x i), with D the dimensionality 
(which is 6 for 3D3V) and i is the number of iterations 
to convergence. For our test case, the time spent in 
k-means calculations exceeded the entire simulation time 
by several factors. An accelerated method is clearly 
needed. To gain sufficient speed in the computation, 
we need an approximate factor of 40 in speed-up. This 
is only possible using more efficient algorithms or more 
efficient hardware, for the same problem size. 

In a separate project, we have investigated acceleration 
by introducing a KD-Tree method, which seems promis¬ 
ing when keeping calculations on CPU. Hardware accel¬ 
eration is also an option, since Lloyd’s k-means is ”em- 
barrasingly parallel”; a GPGPU kernel (in OpenCL) was 
implemented, with considerable speed-up, on-core. How¬ 
ever, the copying of data to-and-fro the GPGPU is costly, 
and only for large data sets does the hardware accelerated 
method become feasible. The question of algorithmically 
accelerated versus hardware accelerated k-means will be 
treated in a subsequent publication (Maly et al.^, in 
prep). 

For our ’’BARBARA” test, a typical accelerated k-means 
step executed about as fast as a typical simulation 
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timestep, in the case of the KD-Tree algorithm, making 
the procedure competitive for several important ap¬ 
plications (see Introduction). The GPGPU performed 
comparably, in some cases better than KD-Tree, but has 
only been benchmarked in stand-alone tests (on large 
6D particle data sets). 

With this article we have given an account for a global 
k-means based phase space compression method. For 
PIC codes we have evaluated the physical conservation 
properties without a glance to computational effort. In a 
separate publication (Maly et al.^, in prep) the question 
of computational feasibility is being addressed in two 
different ways; by algorithmic acceleration (KD-Tree 
algorithm optimization with MPI+OpenMP support), 
and by hardware acceleration (Lloyd’s brute force on 
GPGPU). 

For a small laser-plasma interaction case, with electron 
bunching (production run example; see Figure [l]), we 
tested the KD-Tree accelerated k-means for 2, 4 and 8 
OMP threads per MPI domain (total of 16-by-{2, 4, 8} 
threads). 


0MP NUM THREADS 

2 

4 

8 

Ref 

Tend H (wall) 

5296 

5319 

5782 

6308 


Runtimes were compared with a reference run where, 
again, no particle control were active. For these cases, 
the runtimes are given in table ??, where we see that 
the 0MP_NUM_THREADS=8 performed most favorably, giv¬ 
ing a reduction in runtime of about 15%. This speed¬ 
up will increase with increasing simulation size, therefore 
increasing particle number density constrasts. The qual¬ 
ity of the resulting structures/energy conservation has 
been verified, and matches very well the ’barbara’ results 
quoted here. 

Both methods, KD-Tree and GPGPU (latter not 
quoted here) acceleration, are indeed affirmative towards 
using our k-means procedure for realistic problems. 

Generality of K-Means Merging & Splitting 

Particle phase space compression and inflation is not 
limited to electromagnetic PIC codes. Any system which 
can be modeled by a discretized particle distribution 
function, /(r(t), £), in a D-dimensional space ( r(t ) G M D ) 
can be manipulated by the k-means optimization scheme 
described in this article. This means that 

1. the number of particles must be high enough to 
consider the particle population(s) as approximat¬ 
ing a continuous distribution within a given volume 
selected for merging, and 

2. that the intrinsic noise in the non-reduced solution 
must outweigh the noise introduced by the reduc¬ 
tion of the particle data set. 


The specific values for these constraints are of course 
problem dependent, the determination of which are be¬ 
yond the scope of this paper. This question is deferred 
to future work. 



FIG. 22. Global maximum CMP number density, max[A ce ii(t)] 
in a stand-alone test of the k-means merging procedure on a 
PIC code used for simulating accretion of planetesimals in 
planet formation. Black: original particle set (2.4 million 
CMPs). Blue: reduced set (1.2 million CMPs). 


Nonetheless, to give an example here, we have verified 
the method in one othe r cas e of a particle-in-cell based 
code, by Johansen et a/P ^ which has shown promising 
results as well. At a well evolved point in time, in a sim¬ 
ulation of the formation of streaming instabilities in a 
proto-planetary disk, a total of 2.4 million particles were 
merged into 1.2 million, and the simulation was restarted 
with the reduced codebook solution. Figure [22] shows a 
comparison between a reference simulation (black) and 
the compressed phase space simulation (yellow). The 
quantity plotted is the (global) maximum CMP number 
density, max[7V ce n(t)], which is a very sensitive measure. 
Coherence is almost perfect after 25 iterations and still 
reasonable after almost 100, even for a reduction by half 
the particle ensemble. The instance of our k-means algo¬ 
rithm used in this stand-alone test was an early stage im¬ 
plementation, originally employing the Linde-Buzo-Gray 
algorithm?^, rather than Lloyd-Forgy’s. Also, we did not 
employ our edge-preserving procedure. The agreement 
plotted here is likely to improve in the future. 

Concluding Remark 

Without the ability to make a direct comparison by 
benchmark, it is difficult to assess the range of valid¬ 
ity, conservation abilities, computational cost, and pos¬ 
sible domain boundary constraints, of the methods men¬ 
tioned in the Introduction. When we consider the prob¬ 
lem from a global PIC domain perspective, the issue of 
memory overhead associated with performing an analyt¬ 
ical match of charge and current densities (without the 
tensor conservation) could possibly approach that asso¬ 
ciated with our k-means procedure when M < K. We 
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therefore encourage a comparison of me rging methods, in 
particular between more recent methods® 71 , and the sta¬ 
tistical k-means based optimization scheme introduced in 
the present article. 

As kindly pointed to also by one of our reviewers, 
k-means sometimes tends to create clusters that are 
more uniform than the original distribution 29 ! This 
may challenge clustering-based merging algorithms 
when considering anisotropic or asymmetric distribution 
functions. Therefore the bump-on-tail problem 2 ^ could 
be included in a suite of common benchmarks for future 
comparisons of merging algorithms (see also Concluding 
remark, sec. IV). 


The clustering merge/split code used for this publication 
may be requested by emailing the author 25 . 
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